
A handy tool for large-scale selective sweep analysis

生信阿拉丁 2022-05-16




1

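Introduction

Selective sweep analysis is one way to study how populations adapt. Many tools and underlying methods exist; this post introduces one designed for a single population with a large sample size. SweeD scans the whole genome for selective sweeps using a composite likelihood ratio (CLR) test. It builds on the SweepFinder algorithm, and its authors report that it runs faster and scales to thousands of sequences.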


2

Installation

Download: https://cme.h-its.org/exelixis/resource/download/software/SweeD_v3.2.1_Linux.tar.gz

tar -xzvf SweeD_v3.2.1_Linux.tar.gz
cd SweeD_v3.2.1_Linux
make -f Makefile.gcc

Check the build: running ./SweeD -help prints the parameter descriptions.

3

Input files

Five input file formats are supported:


1


The SweepFinder format

It has four columns:

  • location: the location of a SNP

  • x: the number of sequences that carry the derived allele at a SNP

  • n: the number of valid sequences at a SNP

  • folded: a binary character which denotes if the SNP is unfolded (0) or folded (1).
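The four columns above imply a few simple consistency rules: x cannot exceed n, and folded must be 0 or 1. A minimal Python sketch that checks these rules before a file is handed to SweeD is shown below. The function names and the example header row are hypothetical, and the exact header SweeD expects should be taken from its manual.

```python
def parse_sf_line(line):
    """Parse one data row of a SweepFinder-style file into (location, x, n, folded)."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
    location = float(fields[0])
    x, n, folded = int(fields[1]), int(fields[2]), int(fields[3])
    if not 0 <= x <= n:
        raise ValueError(f"x must lie between 0 and n: {line!r}")
    if folded not in (0, 1):
        raise ValueError(f"folded must be 0 or 1: {line!r}")
    return location, x, n, folded


def read_sf(lines):
    """Return validated rows; the first non-blank line is skipped if it looks like a header."""
    rows = []
    seen_first = False
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        if not seen_first:
            seen_first = True
            try:
                float(stripped.split()[0])
            except ValueError:
                continue  # assumption: a non-numeric first line is a header row
        rows.append(parse_sf_line(stripped))
    return rows


# Hypothetical example input; the header names are an assumption.
example_sf = [
    "position\tx\tn\tfolded",
    "460.0\t9\t100\t0",
    "3436.0\t50\t100\t1",
]
sf_rows = read_sf(example_sf)
print(sf_rows)
```

Running the check first gives a clear error message that points at the offending row.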



2


FASTA format

Everyone is familiar with this format, so it needs no further explanation.



3


ms-like format

Hudson’s ms outputs binary data (0 and 1) instead of DNA data (A, C, G, or T). Usually, state 1 is called ‘derived’ and state 0 is called ‘ancestral’.
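To make the 0/1 coding concrete, the sketch below reads a single ms-style sample block and counts, for each segregating site, how many haplotypes carry state 1 (derived). It is only an illustration: the function name and the `length` scaling parameter are hypothetical, and it does not claim to reproduce how SweeD reads ms files internally.

```python
def ms_block_to_counts(block_lines, length=1.0):
    """Turn one ms sample block into (position, x, n, folded) tuples.

    ms reports positions as fractions of the simulated region, so they are
    multiplied by `length` (a hypothetical region length in bp).
    """
    positions = None
    haplotypes = []
    for line in block_lines:
        line = line.strip()
        if line.startswith("positions:"):
            positions = [float(p) * length for p in line.split()[1:]]
        elif positions is not None and line and set(line) <= {"0", "1"}:
            haplotypes.append(line)
    if positions is None:
        raise ValueError("no 'positions:' line found")
    n = len(haplotypes)
    rows = []
    for i, pos in enumerate(positions):
        x = sum(h[i] == "1" for h in haplotypes)  # haplotypes with the derived state
        rows.append((pos, x, n, 0))  # 0 = unfolded, since ms knows the ancestral state
    return rows


# A tiny hand-written block in ms layout (4 haplotypes, 3 segregating sites).
example_ms = [
    "//",
    "segsites: 3",
    "positions: 0.25 0.5 0.75",
    "010",
    "110",
    "001",
    "011",
]
ms_rows = ms_block_to_counts(example_ms, length=1000)
print(ms_rows)
```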



4


MaCS-like format

MaCS [Chen et al., 2009] is a Markovian coalescent simulator. This format is uncommon, so it is not covered in detail here.



5


VCF format

VCF is a format most of us know well, and using it as input is simple and fast.
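For readers who want to see how a VCF record relates to the x and n counts of the SweepFinder format, here is a rough Python sketch. It assumes a biallelic site with a GT field and treats the ALT allele as the counted allele; whether ALT is actually the derived allele depends on how the VCF was polarized, which this code does not check. The function name is hypothetical.

```python
def vcf_record_to_counts(record_line):
    """Return (pos, x, n) for one biallelic VCF data line.

    x = number of called ALT alleles, n = number of called alleles.
    Missing calls (".") are ignored.
    """
    fields = record_line.rstrip("\n").split("\t")
    pos = int(fields[1])
    gt_index = fields[8].split(":").index("GT")
    x = n = 0
    for sample in fields[9:]:
        gt = sample.split(":")[gt_index]
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing call, not a valid sequence at this SNP
            n += 1
            if allele == "1":
                x += 1
    return pos, x, n


# Hypothetical record with four diploid samples, one of them missing.
example_vcf = "chr1\t1500\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t1|1\t./.\t0/0"
vcf_counts = vcf_record_to_counts(example_vcf)
print(vcf_counts)
```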


4

Running the command

./SweeD -name test -input input.file -grid 10000

The parameters are:
-name: Specifies a name for the run and the output files.

-input: Specifies the name of the input alignment file, in one of the formats listed above, for example the SF (SweepFinder) format.

-grid: Specifies the number of positions in the alignment where the CLR will be computed.
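When many chromosomes or scaffolds need to be scanned, it can help to build the command line in a script. The sketch below only assembles the three flags described above; the helper name and the default `./SweeD` path are assumptions, and nothing is executed here.

```python
import shlex


def build_sweed_command(run_name, input_path, grid, executable="./SweeD"):
    """Assemble a SweeD command line from the -name, -input and -grid flags."""
    if grid <= 0:
        raise ValueError("grid must be a positive number of positions")
    return [executable, "-name", run_name, "-input", input_path, "-grid", str(grid)]


cmd = build_sweed_command("test", "input.file", 10000)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the binary has been built.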


5

Output

Two files are written:
1) An information file (SweeD_Info.runName), which contains information related to the run of the program (the command line, for instance).

2) A report file (SweeD_Report.runName), which is the main output file of the program (the score of the statistic at each position). This is the results file we want.

It has three main columns:

Column 1: the alignment positions where the SweeD score is calculated
Column 2: the corresponding likelihood value
Column 3: the corresponding α value, which is a function of the selection coefficient, the recombination rate and the effective population size.
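A common next step is to pull the positions with the highest likelihood values out of the report. The sketch below is one way to do that in Python. It assumes the report holds whitespace-separated rows of three numbers and skips any other line; the example header and `//1` marker are assumptions about the file layout, and the function names are hypothetical.

```python
def read_sweed_report(lines):
    """Collect (position, likelihood, alpha) rows; non-numeric lines are skipped."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue
        try:
            position, likelihood, alpha = (float(f) for f in fields)
        except ValueError:
            continue  # e.g. a header row
        rows.append((position, likelihood, alpha))
    return rows


def top_peaks(rows, k=3):
    """Return the k rows with the highest likelihood, best first."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:k]


# Hypothetical report content with made-up values, for illustration only.
example_report = [
    "//1",
    "Position\tLikelihood\tAlpha",
    "100.0\t0.5\t1.2e-3",
    "200.0\t4.2\t3.4e-4",
    "300.0\t1.1\t8.0e-4",
]
report_rows = read_sweed_report(example_report)
best = top_peaks(report_rows, k=2)
print(best)
```

Ranking by the likelihood column is only a starting point; deciding which peaks are significant usually needs a threshold, for example one derived from neutral simulations.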


References

Gary K Chen, Paul Marjoram, and Jeffrey D Wall. Fast and flexible simulation of DNA sequence data. Genome Res, 19(1):136-142, Jan 2009. doi: 10.1101/gr.083634.108. URL http://dx.doi.org/10.1101/gr.083634.108.
Richard R Hudson. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics, 18(2):337-338, Feb 2002.
Pavlos Pavlidis, Daniel Živković, Alexandros Stamatakis, and Nikolaos Alachiotis. SweeD: Likelihood-based detection of selective sweeps in thousands of genomes. Molecular Biology and Evolution, 30(9):2224-2234, 2013.

Author: 小龙

Reviewer: 童蒙

Layout: amethyst

